-
Quadratic performance indices associated with linear plants offer simplicity and lead to linear feedback control laws, but they may not adequately capture the complexity and flexibility required to address various practical control problems. One notable example is the trade-off between rise time and overshoot commonly observed in classical regulator problems with linear feedback control laws, which possibly nonlinear laws can improve. To address these issues, non-quadratic terms can be introduced into the performance index, resulting in nonlinear control laws. In this study, we tackle the challenge of solving optimal control problems with non-quadratic performance indices using the closed-loop neighboring extremal optimal control (NEOC) approach and the homotopy method. Building upon the Linear Quadratic Regulator (LQR) framework, we introduce a parameter associated with the non-quadratic terms in the cost function, which is continuously adjusted from 0 to 1. We propose an iterative algorithm based on a closed-loop NEOC framework to handle each incremental adjustment of this parameter. Additionally, we discuss and analyze the classical work of Bass and Webber, whose approach involves including additional non-quadratic terms in the performance index to render the resulting Hamilton-Jacobi equation analytically solvable. Our findings are supported by numerical examples.
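The homotopy idea in this abstract can be illustrated on a scalar toy problem. The sketch below is not the paper's NEOC algorithm: it assumes a hypothetical plant dx/dt = A*x + B*u with running cost Q*x^2 + eps*Q4*x^4 + R*u^2 (all constants invented for illustration), starts from the LQR solution at eps = 0, and uses warm-started Newton iterations on the scalar Hamilton-Jacobi equation as eps is stepped to 1.

```python
import math

# Hypothetical scalar plant and cost weights (illustrative, not from the paper).
A, B = 1.0, 1.0            # dx/dt = A*x + B*u
Q, Q4, R = 1.0, 1.0, 1.0   # running cost Q*x^2 + eps*Q4*x^4 + R*u^2


def hjb_residual(p, x, eps):
    """Scalar Hamilton-Jacobi residual, with p standing for dV/dx at state x."""
    return (B * B / (4.0 * R)) * p * p - A * x * p - (Q * x * x + eps * Q4 * x ** 4)


def hjb_residual_dp(p, x):
    """Derivative of the residual with respect to p."""
    return (B * B / (2.0 * R)) * p - A * x


def lqr_gain():
    """Stabilizing LQR gain K (u = -K*x) from the scalar Riccati equation."""
    return (A + math.sqrt(A * A + B * B * Q / R)) / B


def homotopy_control(x, steps=10, newton_iters=20):
    """Control at state x after sweeping eps from 0 to 1 in `steps` increments."""
    if x == 0.0:
        return 0.0
    # eps = 0: LQR solution, where V = P*x^2, dV/dx = 2*P*x and K = B*P/R.
    p = 2.0 * (R * lqr_gain() / B) * x
    for k in range(1, steps + 1):
        eps = k / steps
        # Newton iterations warm-started from the solution at the previous eps.
        for _ in range(newton_iters):
            p -= hjb_residual(p, x, eps) / hjb_residual_dp(p, x)
    # Minimizing the Hamiltonian over u gives u = -(B / (2*R)) * dV/dx.
    return -(B / (2.0 * R)) * p
```

With `steps=0` the function returns the linear LQR control; with the default sweep it returns a nonlinear control that acts more strongly than LQR at large |x| because of the quartic state penalty. In the scalar case the equation has a closed-form root, so the continuation is only a stand-in for the multi-dimensional setting where no such formula exists.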
-
This paper proposes a novel approach that enables a robot to learn an objective function incrementally from human directional corrections. Existing methods learn from human magnitude corrections; since a human needs to carefully choose the magnitude of each correction, those methods can easily lead to over-corrections and learning inefficiency. The proposed method only requires human directional corrections, that is, corrections that only indicate the direction of an input change without indicating its magnitude. We only assume that each correction, regardless of its magnitude, points in a direction that improves the robot’s current motion relative to an unknown objective function. The allowable corrections satisfying this assumption account for half of the input space, as opposed to the magnitude corrections, which have to lie in a shrinking level set. For each directional correction, the proposed method updates the estimate of the objective function based on a cutting plane method, which has a geometric interpretation. We have established theoretical results to show the convergence of the learning process. The proposed method has been tested in numerical examples, a user study on two human-robot games, and a real-world quadrotor experiment. The results confirm the convergence of the proposed method and further show that the method is significantly more effective (higher success rate), efficient/effortless (fewer human corrections needed), and potentially more accessible (fewer early wasted trials) than state-of-the-art robot learning frameworks.
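The cutting-plane update described in this abstract can be sketched on a one-dimensional toy problem. Everything specific here is an assumption made for illustration: the objective is taken to be linear in its weights, J(u; theta) = theta1*(u - 1)^2 + theta2*(u + 1)^2, the hypothesis set is a coarse grid of candidate weights, and the "center" is simply the mean of the surviving candidates, which may differ from how the paper chooses its estimate. A correction in direction a at input u improves the true objective when a * dJ/du < 0, which is a half-space constraint h . theta < 0 on the weights.

```python
def feature_gradient(u):
    """d(phi)/du for the assumed features phi(u) = [(u - 1)^2, (u + 1)^2]."""
    return [2.0 * (u - 1.0), 2.0 * (u + 1.0)]


def plan(theta):
    """Minimizer of J(u; theta) for positive weights theta."""
    return (theta[0] - theta[1]) / (theta[0] + theta[1])


def center(candidates):
    """Mean of the surviving candidates, used as the current weight estimate."""
    n = len(candidates)
    return [sum(c[0] for c in candidates) / n, sum(c[1] for c in candidates) / n]


def learn_from_directions(direction_of, rounds=30, grid=40):
    """Shrink a grid of candidate weights with one half-space cut per correction."""
    candidates = [(i / grid, j / grid)
                  for i in range(1, grid + 1) for j in range(1, grid + 1)]
    theta = center(candidates)
    for _ in range(rounds):
        u = plan(theta)
        a = direction_of(u)  # +1.0 or -1.0; 0.0 means no further correction
        if a == 0.0:
            break
        g = feature_gradient(u)
        h = [g[0] * a, g[1] * a]
        # Keep only weights for which the correction is a descent direction.
        kept = [c for c in candidates if h[0] * c[0] + h[1] * c[1] < 0.0]
        if not kept:
            break
        candidates = kept
        theta = center(candidates)
    return theta


def simulated_human(u, u_desired=0.5, tol=1e-9):
    """Hypothetical corrector: reports only the sign of the desired change."""
    if abs(u_desired - u) < tol:
        return 0.0
    return 1.0 if u_desired > u else -1.0


theta_hat = learn_from_directions(simulated_human)
u_hat = plan(theta_hat)
```

Because the planned input minimizes the objective under the current estimate, each cut passes through that estimate, so every correction discards roughly half of the remaining candidates regardless of how large the correction would have been.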
-
This paper develops the method of Continuous Pontryagin Differentiable Programming (Continuous PDP), which enables a robot to learn an objective function from a few sparsely demonstrated keyframes. The keyframes, labeled with some time stamps, are the desired task-space outputs, which a robot is expected to follow sequentially. The time stamps of the keyframes can be different from the time of the robot’s actual execution. The method jointly finds an objective function and a time-warping function such that the robot’s resulting trajectory sequentially follows the keyframes with minimal discrepancy loss. The Continuous PDP minimizes the discrepancy loss using projected gradient descent, by efficiently computing the gradient of the robot trajectory with respect to the unknown parameters. The method is first evaluated on a simulated robot arm and then applied to a 6-DoF quadrotor to learn an objective function for motion planning in unmodeled environments. The results show the efficiency of the method, its ability to handle time misalignment between keyframes and robot execution, and the generalization of objective learning into unseen motion conditions.
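The projected gradient descent step mentioned in this abstract can be sketched with a deliberately simplified stand-in. In the paper the trajectory comes from solving an optimal control problem and its gradient is obtained through Continuous PDP; here, purely as an assumption, the trajectory is a closed-form function y(t; theta) = theta*t + t^2, the time warping is t = beta*tau with beta kept positive by projection, and the keyframes are synthetic values generated from theta = 2.0 and beta = 1.5.

```python
# Hypothetical keyframes (tau_i, y_i), generated from theta = 2.0, beta = 1.5.
KEYFRAMES = [(0.5, 2.0625), (1.0, 5.25), (1.5, 9.5625), (2.0, 15.0)]


def trajectory(t, theta):
    """Assumed closed-form trajectory; stands in for an optimal-control rollout."""
    return theta * t + t * t


def loss_and_grad(theta, beta):
    """Mean squared keyframe discrepancy and its gradient in (theta, beta)."""
    n = len(KEYFRAMES)
    loss = g_theta = g_beta = 0.0
    for tau, y in KEYFRAMES:
        t = beta * tau                       # time warping: keyframe time -> robot time
        r = trajectory(t, theta) - y
        loss += r * r / n
        g_theta += 2.0 * r * t / n                       # dy/dtheta = t
        g_beta += 2.0 * r * (theta + 2.0 * t) * tau / n  # dy/dt * dt/dbeta
    return loss, g_theta, g_beta


def project(theta, beta, theta_box=(0.0, 10.0), beta_min=0.1):
    """Project onto the feasible set: theta in a box, beta bounded away from 0."""
    theta = min(max(theta, theta_box[0]), theta_box[1])
    beta = max(beta, beta_min)
    return theta, beta


def fit(theta=1.0, beta=1.0, step=0.005, iters=20000):
    """Projected gradient descent: gradient step, then projection."""
    for _ in range(iters):
        _, g_theta, g_beta = loss_and_grad(theta, beta)
        theta, beta = project(theta - step * g_theta, beta - step * g_beta)
    return theta, beta


theta_fit, beta_fit = fit()
```

The projection keeps the time-warping slope positive so that warped time still moves forward; in this small example the constraint may never become active, and the step size and iteration count are hand-tuned for these four keyframes only.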
